Additional file 7. Overview of the Sprai algorithm and its performance

Abstract

The detailed algorithm of Sprai will be published elsewhere, but we give an overview of it here. Sprai is primarily designed for correcting sequencing errors in single-molecule sequencing reads, and can therefore be integrated with any analysis tool that accepts long, high-accuracy reads. De novo genome assembly is one of the most common genome analyses, and for this reason we integrated Celera Assembler [1] with Sprai so that Sprai can be used as a de novo genome assembler. When the user selects the integration option, Sprai takes raw reads and outputs assembled contigs. Sprai may be used with other genome assemblers, but so far we have tested it only with Celera Assembler. When Sprai is used as a de novo genome assembler, it works very similarly to the HGAP algorithm [2]. Both algorithms take only Continuous Long Reads (CLRs) from PacBio RS as input and do not require reads from other sequencing platforms such as Illumina HiSeq. Both take a set of the longer reads among the input raw reads as "seed reads," which are error-corrected and assembled with Celera Assembler. Both align the entire set of input reads against the seed reads and then create multiple alignments from the similarities identified by the pairwise alignment. Both detect chimeric reads by finding regions covered by few alignments. The differences between Sprai and HGAP are as follows. First, the seed reads of HGAP are the reads longer than a user-specified threshold, whereas Sprai takes an arbitrary set of reads and chooses error-corrected reads up to the user-specified depth (20× by default, the same as the target depth for seed reads in HGAP). This strategy is particularly useful when the proportion of chimeric and low-quality reads, which is usually unknown prior to error correction, is relatively high. Second, Sprai finds similarities between raw reads using NCBI BLAST+ [3], whereas HGAP uses BLASR [4].
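The contrast between the two selection strategies described above can be illustrated with a minimal Python sketch. It is not Sprai's or HGAP's actual code. The function names, the `genome_size` estimate, and the longest-first ordering are assumptions made for illustration; the text states only that HGAP applies a length threshold and that Sprai keeps error-corrected reads up to a target depth (20× by default).

```python
from typing import List


def select_by_length(read_lengths: List[int], min_length: int) -> List[int]:
    """Threshold-style selection: keep reads longer than a user-chosen length."""
    return [n for n in read_lengths if n > min_length]


def select_by_depth(
    corrected_lengths: List[int],
    genome_size: int,
    target_depth: float = 20.0,
) -> List[int]:
    """Depth-style selection: keep corrected reads until the kept bases
    reach target_depth * genome_size.

    Assumption: reads are taken longest first; the source does not say
    in which order reads are chosen.
    """
    budget = target_depth * genome_size  # total bases wanted
    chosen: List[int] = []
    total = 0
    for n in sorted(corrected_lengths, reverse=True):
        if total >= budget:
            break
        chosen.append(n)
        total += n
    return chosen
```

Because the depth-based version is applied to reads after correction, chimeric or low-quality reads that were discarded during correction do not consume any of the depth budget, which seems to be the advantage the text points to.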
BLAST+ does not create a search index in memory, which makes Sprai scalable to thousands of CPU cores as long as the bandwidth of the file system scales. BLASR uses a 32-bit integer array for its search index, so input raw reads have to be split into smaller chunks and the results merged afterwards, an additional step that is not necessary for BLAST+. Third, Sprai …
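The chunk-and-merge step that a size-limited index forces can be sketched as follows. This is a hypothetical illustration, not code from BLASR, HGAP, or Sprai. The `max_bases` limit, the greedy packing, and the hit tuples are all assumptions; the text says only that a 32-bit index requires splitting the input and merging the results.

```python
from typing import List, Sequence, Tuple


def split_into_chunks(
    read_lengths: Sequence[int],
    max_bases: int = 2**32 - 1,  # assumed cap on bases per index
) -> List[List[int]]:
    """Greedily pack read indices into chunks whose total length
    does not exceed max_bases."""
    chunks: List[List[int]] = []
    current: List[int] = []
    current_bases = 0
    for idx, n in enumerate(read_lengths):
        if n > max_bases:
            raise ValueError(f"read {idx} alone exceeds the chunk limit")
        # Start a new chunk when adding this read would overflow the cap.
        if current and current_bases + n > max_bases:
            chunks.append(current)
            current = []
            current_bases = 0
        current.append(idx)
        current_bases += n
    if current:
        chunks.append(current)
    return chunks


def merge_hits(per_chunk_hits: Sequence[Sequence[Tuple]]) -> List[Tuple]:
    """The extra merge step: concatenate alignment hits from every chunk."""
    merged: List[Tuple] = []
    for hits in per_chunk_hits:
        merged.extend(hits)
    return merged
```

With a search tool that needs no in-memory index, neither function would be required: each query could be run independently against the same database files.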

Publication date: 2014